7 research outputs found

    Placement for fast and reliable through-silicon-via (TSV) based 3D-IC layouts

    The objective of this research is to explore the feasibility of addressing, at the placement stage, the major performance and reliability problems found in three-dimensional integrated circuits (3D ICs) that use through-silicon vias (TSVs), such as wirelength, stress-induced carrier mobility variation, temperature, and quality trade-offs. Four main works support this goal. The first work focuses on the wirelength of TSV-based 3D ICs. The second examines stress-induced carrier mobility variation in TSV-based 3D ICs. The third investigates the temperature inside TSV-based 3D ICs. The final work explores the quality trade-offs of TSV-based 3D-IC designs. In the first work, a force-directed, 3D, gate-level placement algorithm that efficiently handles TSVs is developed. Experiments based on synthesized benchmarks indicate that the developed algorithm helps generate GDSII layouts of 3D-IC designs that are optimized in terms of wirelength. In addition, the impact of TSVs on other physical aspects of 3D-IC designs is studied by analyzing the GDSII layouts. In the second work, a model for carrier mobility variation caused by TSV and shallow trench isolation (STI) stresses is developed, together with a timing analysis flow that considers these stresses. The impact of TSV and STI stresses on carrier mobility variation and on the performance of 3D ICs is studied. Furthermore, a TSV-stress-driven, force-directed, 3D placement algorithm is developed. It exploits the carrier mobility variation caused by stress around TSVs after fabrication to improve the timing and area objectives during placement. In addition, the impact of the keep-out zone (KOZ) around TSVs on stress, carrier mobility variation, area, wirelength, and performance of 3D ICs is studied. In the third work, two temperature-aware global placement algorithms are developed. They exploit die-to-die thermal coupling in 3D ICs to improve temperature during placement. In addition, a framework for evaluating the results of temperature-aware global placement is developed. The main component of the framework is a GDSII-level thermal analysis that considers all structures inside a TSV-based 3D IC when computing temperature. The developed placers are compared with several state-of-the-art placers published in recent literature. The experimental results indicate that the developed algorithms improve the temperature of 3D ICs effectively. In the final work, three block-level design styles for TSV-based die-to-wafer bonded 3D ICs are discussed. Several 3D-IC layouts in the three styles are manually designed. The main difference among these layouts is the position of the TSVs. Finally, the area, wirelength, timing, power, temperature, and mechanical stress of all layouts are compared to explore the trade-offs in layout quality.
    PhD. Committee Chair: Lim, Sung Kyu; Committee Member: Bakir, Muhannad; Committee Member: Kim, Hyesoon; Committee Member: Mukhopadhyay, Saibal; Committee Member: Swaminathan, Madhava

    Block-level Designs of Die-to-Wafer Bonded 3D ICs and Their Design Quality Tradeoffs

    In 3D ICs, block-level designs provide various advantages over designs done at other granularities, such as gate-level, because they promote the reuse of IP blocks. In this paper, we study block-level 3D-IC designs in which the footprints of the dies in the stack are different. This happens in the case of die-to-wafer bonding, which is a more popular choice for near-term, low-cost 3D designs. We study design quality tradeoffs among three different ways to place through-silicon vias (TSVs): TSV-farm, TSV-distributed, and TSV-whitespace. In our holistic approach, we use wirelength, power, performance, temperature, and mechanical stress metrics to conduct comprehensive comparative studies of the three design styles. In addition, we analyze the impact of TSV size and pitch on the design quality of these three styles.

    Impact of Mechanical Stress on the Full Chip Timing for Through-Silicon-Via-based 3-D ICs

    In this paper, we study the impact of through-silicon-via (TSV) and shallow trench isolation (STI) stress on the timing variations of 3-D ICs. We also propose the first systematic TSV-STI-stress-aware timing analysis and show how to optimize layouts for better performance. First, we generate a stress contour map with an analytical radial stress model for TSVs. We also develop a stress model for STI from finite element analysis results. Then, depending on the geometric relation between TSVs, STI, and transistors, the tensile and compressive stresses are converted to hole and electron mobility variations. A mobility-variation-aware cell library and netlist are generated and incorporated into an industrial engine for timing analysis of 3-D ICs. We observe that TSV stress and STI stress interact with each other, and that rise and fall times react differently to stress and to the location relative to both TSVs and STIs. Overall, TSV-STI-stress-induced timing variations can be as much as ±15% at the cell level. Thus, as an application to layout optimization, we exploit the stress-induced mobility enhancement to improve the performance of 3-D ICs. We show that stress-aware layout perturbation could reduce cell delay by up to 23.37% and critical path delay by 6.67% in our test case.
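
    The stress-to-mobility step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the form of its radial model or its conversion coefficients, so everything here is an assumption made for illustration: an inverse-square decay of stress outside the TSV, a first-order piezoresistive conversion (relative mobility change = -coefficient × stress), and the hypothetical names `radial_stress_mpa` and `mobility_change`. All numeric values are placeholders, not data from the paper.

```python
def radial_stress_mpa(r_um, tsv_radius_um, edge_stress_mpa):
    """Stress at distance r_um from the TSV center.

    Assumed form (not taken from the abstract): the stress equals
    edge_stress_mpa at the TSV edge and decays as (R / r)^2 outside it.
    """
    if r_um < tsv_radius_um:
        raise ValueError("point lies inside the TSV")
    return edge_stress_mpa * (tsv_radius_um / r_um) ** 2


def mobility_change(stress_mpa, piezo_coeff_per_mpa):
    """Relative mobility change under a first-order piezoresistive model.

    The coefficient is a placeholder; real values depend on carrier type
    and on the channel orientation relative to the stress direction.
    """
    return -piezo_coeff_per_mpa * stress_mpa


# Placeholder inputs: 2.5 um TSV radius, 100 MPa at the TSV edge.
stress_at_5um = radial_stress_mpa(5.0, 2.5, 100.0)
delta_mu = mobility_change(stress_at_5um, -1e-4)
```

    In this simplified form, doubling the distance from the TSV center quarters the stress, which is why a keep-out zone around the TSV limits the mobility shift seen by nearby cells.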

    A Study of Through-Silicon-Via Impact on the 3D Stacked IC Layout

    Through-Silicon-Via (TSV) is the enabling technology for the fine-grained 3D integration of multiple dies into a single stack. These TSVs occupy non-negligible silicon area because of their sheer size. The silicon area occupied by the TSVs and the interconnections made to them greatly affect the area, power, performance, and reliability of 3D IC layouts. Well-managed TSVs alleviate congestion, reduce wirelength, and improve performance, whereas excessive TSVs not only increase the die area but also have a negative impact on many design objectives. In this paper, we study the impact of TSVs on various aspects of 3D layouts. We use GDSII layouts of 2D and 3D designs and thoroughly compare the pros and cons of TSV usage. We propose a new force-directed 3D gate-level placement algorithm that efficiently handles TSVs. In addition, we present an algorithm that assigns TSVs to nets to complete the routing that involves TSVs. This algorithm, together with our 3D placer, is integrated into a commercial P&R tool to generate fully validated GDSII layouts. Our experiments based on synthesized benchmarks indicate that our algorithms help generate GDSII layouts of 3D designs that are optimized in terms of area, wirelength, and metal layer count.

    Fast bidirectional shortest path on GPU


    High performance application specific stream architecture for hardware acceleration of HOG-SVM on FPGA

    Conventional sequential software processing on a general-purpose CPU has become insufficient for certain heavy computations because of the processing power needed to deliver adequate throughput and performance. There is considerable interest in high-performance real-time video processing on embedded systems. However, embedded processing platforms with limited performance can hardly meet the processing demand of such intensive computations in the computer vision domain. Hardware acceleration is therefore an attractive solution, in which processing-intensive computations are accelerated using application-specific hardware integrated with a general-purpose CPU. In this research we focus on building a parallelized, high-performance, application-specific architecture for a hardware accelerator for HOG-SVM computation, implemented on a Zynq 7000 FPGA. The Histogram of Oriented Gradients (HOG) technique combined with a Support Vector Machine (SVM) based classifier is versatile and extremely popular in the computer vision domain despite its high demand for processing power. Because of this popularity and versatility, several previous studies have attempted to obtain adequate throughput on HOG-SVM. With a throughput of 240 FPS at a single scale on VGA frames (640x480), this work outperforms the best single-scale performance of previous research by a factor of approximately 3-4. Furthermore, it is an approximately 15x speedup over the GPU-accelerated software version with the same accuracy. This research explores the possibility of using a novel architecture based on deep pipelining, parallel processing, and BRAM structures to achieve high performance on the HOG-SVM computation. Further, the video processing unit (VPU) developed above, which acts as a hardware accelerator, will be integrated as a co-processing peripheral to a host CPU using a novel custom accelerator structure with on-chip buses in a System-on-Chip (SoC) fashion. Th

    Application specific architecture for hardware accelerating HOG-SVM to achieve high throughput on HD frames

    Computer vision is an emerging field with diverse applications, encompassing many computationally heavy algorithms. Histogram of Oriented Gradients-Support Vector Machine (HOG-SVM) is one such versatile algorithm, used for object detection and image classification despite its heavy computational load. Processing such an algorithm in real time with adequate throughput is a challenging task for a general-purpose processor. Moreover, an embedded CPU with very limited processing power can hardly handle such heavy processing. Our research therefore focuses on developing application-specific architectures for hardware acceleration of computer vision algorithms. This paper continues a series of studies on hardware acceleration of the HOG-SVM algorithm on FPGA. We present a high-performance, application-specific architecture for hardware acceleration of HOG-SVM that achieves a throughput of 240 fps on HD frames of size 1920x1080, a significant improvement over previous research. At the same time, both hardware utilization and power consumption are minimized. A mechanism based on Block RAM (BRAM) structures and deep pipelining is the key architectural technique for achieving high performance. The proposed design was deployed on the Zynq 7000 FPGA platform, which contains a hardwired ARM CPU along with the programmable FPGA fabric. The accelerator is deployed on the FPGA and integrated with the ARM CPU using AXI memory interfaces. A hardware thread model and bare-metal device drivers were developed, which expose the accelerator as a hardware thread to the applications running on the ARM CPU.
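
    To put the 240 fps figures quoted for VGA (640x480) and HD (1920x1080) frames in perspective, the following sketch computes the pixel rate each implies. The helper names `pixel_rate` and `min_clock_mhz` are hypothetical, and the one-pixel-per-clock-cycle assumption behind the clock bound is an illustration only; neither abstract states how many pixels the accelerator consumes per cycle or what clock it runs at.

```python
def pixel_rate(width, height, fps):
    """Pixels per second that must be consumed to sustain `fps` frames."""
    return width * height * fps


def min_clock_mhz(width, height, fps, pixels_per_cycle=1):
    """Lower bound on the clock frequency in MHz.

    Ignores blanking intervals and pipeline stalls. pixels_per_cycle is a
    hypothetical design parameter, not a value from the abstracts.
    """
    return pixel_rate(width, height, fps) / pixels_per_cycle / 1e6


vga_rate = pixel_rate(640, 480, 240)    # 73,728,000 pixels/s
hd_rate = pixel_rate(1920, 1080, 240)   # 497,664,000 pixels/s
ratio = hd_rate / vga_rate              # 6.75
```

    At the same frame rate, an HD stream carries 6.75 times as many pixels per second as a VGA stream, so sustaining 240 fps on HD frames would require either a proportionally higher clock or more pixels processed per cycle.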